The Corpus for Idiolectal Research (CIDRE)
Abstract
The Corpus for Idiolectal Research (CIDRE) is a collection of fiction works from 11 prolific 19th-century French authors (4 women, 7 men; 22–62 works per author; 37 million words in total). Every work is dated with the year it was written. Using programming scripts, the works have been gathered from open source platforms, for example La Bibliothèque électronique du Québec, and stripped of paratext (text that is not part of the novel, e.g. prefaces). We distribute the text files, the dating, other metadata, and the scripts under an open license. CIDRE is the first resource for studying style and idiolect in a diachronic manner (i.e. stylochronometry) on a larger scale.
Journal
Journal title: Journal of Open Humanities Data
Year: 2021
ISSN: 2059-481X
DOI: https://doi.org/10.5334/johd.42